
perf: defer timezonefinder & pandas imports to cut startup time #482

Open

mrosseel wants to merge 177 commits into brickbots:main from mrosseel:lazy-imports


@mrosseel

Collaborator

Re-implements the salvageable part of #378 (boot speed) cleanly on top of the nixos branch.

#378's perf work was lost during that branch's own internal merges — its HEAD no longer contains the lazy-import changes, and they were never merged upstream either. This brings back the part that still matters for startup time.

What this defers

  • timezonefinder (state.py): SharedState.__init__ eagerly imported timezonefinder and constructed TimezoneFinder() (which loads its dataset) at startup. It's now constructed lazily on the first set_location(), i.e. after boot.
  • pandas (plot.py): imported at module level. plot.py sits on the startup path (menu_structure → UIChart → plot), so this loaded pandas during boot. Deferred into the four functions that actually use it.
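A minimal sketch of both deferrals, assuming a simplified class: LazyTimezoneState, tf_factory and magnitude_frame are hypothetical names, not the actual SharedState or plot.py code. The factory argument exists only so the lazy path can be exercised without timezonefinder installed.

```python
from typing import Any, Callable, Optional


def _make_timezone_finder() -> Any:
    # Deferred import: timezonefinder (and its dataset) is only loaded on first use.
    from timezonefinder import TimezoneFinder

    return TimezoneFinder()


class LazyTimezoneState:
    """Hypothetical stand-in for SharedState, showing only the lazy-construction part."""

    def __init__(self, tf_factory: Callable[[], Any] = _make_timezone_finder):
        self._tf_factory = tf_factory
        self._tf: Optional[Any] = None  # nothing expensive happens in __init__
        self.location = None
        self.timezone = None

    def set_location(self, lat: float, lon: float) -> None:
        # The first call pays the construction cost; later calls reuse the instance.
        if self._tf is None:
            self._tf = self._tf_factory()
        self.location = (lat, lon)
        self.timezone = self._tf.timezone_at(lng=lon, lat=lat)


def magnitude_frame(mags):
    """Hypothetical plot helper: pandas is imported inside the function, not at module level."""
    import pandas as pd

    return pd.DataFrame({"mag": mags})
```

The first set_location() builds the TimezoneFinder once and every later call reuses it. Because Python caches imported modules in sys.modules, repeating `import pandas as pd` inside a function costs only a dictionary lookup after the first call.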

comets.py is intentionally left unchanged: its module-level 'from skyfield.data import mpc' imports pandas regardless, so deferring pandas there is a no-op without also deferring skyfield (out of scope here).

Verification

  • Importing state.py no longer loads timezonefinder (confirmed at runtime).
  • plot.py's only module-level pandas reference is removed; skyfield.data.hipparcos already imports pandas lazily, and nothing else in plot.py's import chain pulls pandas.
  • py_compile clean on all touched files.
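One way to repeat the runtime check, as a sketch: import a module in a fresh interpreter and report whether a heavy dependency ended up in sys.modules. A fresh subprocess matters because the current process may already have imported the dependency. loads_module is a hypothetical helper, not part of the repository.

```python
import subprocess
import sys


def loads_module(target: str, probe: str) -> bool:
    """Import `target` in a fresh interpreter and report whether `probe` is in sys.modules."""
    code = (
        "import importlib, sys; "
        f"importlib.import_module({target!r}); "
        f"print({probe!r} in sys.modules)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"
```

For example, run from the python/ directory (assuming PiFinder is importable from there), loads_module("PiFinder.state", "timezonefinder") should return False with this change. CPython's `python -X importtime -c "import PiFinder.state"` additionally prints per-module import times to stderr.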

🤖 Generated with Claude Code

mrosseel and others added 30 commits February 4, 2026 19:02
- build.yml: single build + Cachix push + unstable channel updates
- release.yml: manual release workflow for stable/beta channels

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The SD image module provides filesystems, but toplevel builds need
a minimal stub to evaluate successfully.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required for NixOS module system to accept devMode setting.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required when module has both options and config sections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaces FIXME placeholders with actual SRI hashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Uses Pi5 runner when RUNNER_LABELS variable is set, falls back to
ubuntu with QEMU emulation otherwise.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Filter to only Pi 4B device tree (CM4 incompatible with our overlays)
- Use shorthand DTS syntax for PWM overlay

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Untracked file was excluded from Nix flake source tree, causing
"No module named 'PiFinder.sys_utils_base'" on SD card boot.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add camera overlay (imx477) to netboot config.txt via flake.nix
- Fix sys_utils import in main.py to use utils.get_sys_utils()
- Add hip_main.dat fetch to pifinder-src.nix for starfield plotting
- Add dma_heap udev rule for libcamera/picamera2 access
- Fix shared memory naming in solver.py (remove leading /)
- Add DNS nameservers for netboot environment
- Document power control scripts in CLAUDE.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add runtimeCameraSelection option to hardware.nix (default: true)
- SD image includes config.txt with "include camera.txt" directive
- Users can edit camera.txt and reboot to switch cameras
- Supported cameras: imx296, imx290 (imx462), imx477
- Fix cameraDriver scope in hardware.nix (moved to top-level let)
- Add sudoers rules for systemctl stop/start pifinder.service
- Add DMA heap udev rule for libcamera video group access
- Netboot config sets cameraType = "imx477" for HQ camera dev

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Refactor sys_utils modules to use common base class
- Add sys_utils_nixos.py for NixOS-specific implementations
- Add get_sys_utils() detection in utils.py for platform selection
- Add flake.lock for reproducible builds
- Add NetworkManager config to networking.nix
- Add deploy-image-to-nfs.sh for netboot development workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update build.yml CI workflow
- Fix fonts.py import
- Fix marking_menus.py formatting
- Add missing import to preview.py
- Simplify objects_db.py
- Add catalog_imports improvements
- Update pifinder_objects.db

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Switch to NFSv4 with caching disabled (noac, actimeo=0)
- Disable auto-optimise-store in devMode (hard links fail on NFS)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ServerAliveInterval/CountMax to prevent timeout during transfers
- Use rsync -R (relative) to preserve directory structure correctly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comets.txt is downloaded at runtime and must be in a writable
location, not the read-only Nix store.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extend eth0 wait to 30 seconds with debug output
- Wait for link carrier before DHCP
- Add DHCP retries (3 attempts)
- Add LIBCAMERA_IPA_MODULE_PATH to pifinder service environment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restore SUBSYSTEM=="pwm" udev rule that was accidentally removed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Turns on keypad LEDs during sysinit for early visual boot feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- boot-splash.c: displays welcome image with scanning animation
- Starts at sysinit, stops when pifinder.service starts
- Much faster than Python splash

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove nixos-hardware module (saves 659MB linux-firmware)
- Fetch nixos-rebuild at runtime (saves ~500MB llvm/nix deps)
- Remove git from systemPackages (nix has built-in git for flakes)

Target: ~150MB vs current 1.7GB
- Remove default packages (vim, nano, etc)
- Disable polkit, udisks2, speechd
- Should reduce closure significantly
NetworkManager-vpnc alone has 1.1GB closure (webkitgtk, llvm, etc).
Disable all NM plugins for bootstrap - we just need WiFi.
github-actions Bot and others added 29 commits May 25, 2026 08:44
- nixos/RELEASE.md: document version flow + release/dev pipelines
- software.py: MIN_NIXOS_VERSION 2.5.0 → 3.0.0
- python-packages.nix: add pyerfa (used by calc_utils since upstream brickbots#423,
  silently dropped during upstream merge because requirements.txt is not
  mirrored into the Nix env)
- python-packages.nix: include hardwarePackages in devEnv so nix develop
  matches the runtime import surface
- python-packages.nix: select simplejpeg wheel by host arch (was hard-pinned
  to aarch64; failed to import on x86_64 dev shells)
- flake.nix: apply libcamera -Dpycamera=enabled overlay to the x86_64
  devShell and export PYTHONPATH so picamera2 finds the python bindings

Verified: nix develop --command python -c 'import …' on x86_64
succeeds for all 34 imports (erfa, picamera2, libcamera, PyHotKey,
pynput, hardware packages, etc.). RPi.GPIO still raises its own
"only on a Raspberry Pi" RuntimeError at import time — expected,
matches upstream pip behavior on non-Pi hardware.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. gpsd-add-uart: rename /dev/ttyAMA3 → /dev/ttyAMA1 (6 sites). The uart3
   overlay surfaces as ttyAMA1, matching hardware.nix's udev rule and the
   Debian image's gpsd.conf.

2. /etc/default/gpsd: drop the custom USBAUTO+GPSD_SOCKET pair, write
   upstream pi_config_files/gpsd.conf's three lines verbatim. DEVICES now
   opens the on-board UART at startup. gpsd-add-uart kept as the boot-time
   socket-activation kick; can retire after on-Pi confirmation.

3. pifinder-upgrade: replace fragile `nix build --dry-run | grep` progress
   with `nix --log-format internal-json build … --max-jobs 0` parsed by
   gawk, counting type=100 (actCopyPath) start/stop events. Stable across
   Nix ≥ 2.4. Validated against a real cache.nixos.org substitute (5/5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
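The progress counting in point 3 is done with gawk; the same logic in Python looks roughly like this. It assumes each internal-json log line is the prefix "@nix " followed by a JSON object, that start events carry a "type", and that stop events carry only the activity "id", so stops are matched against the ids of type-100 starts. copy_progress is a hypothetical name, not the actual pifinder-upgrade code.

```python
import json
from typing import Iterable, Tuple

ACT_COPY_PATH = 100  # Nix activity type actCopyPath


def copy_progress(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (started, finished) actCopyPath counts from internal-json log lines."""
    copy_ids = set()
    started = finished = 0
    for line in lines:
        if not line.startswith("@nix "):
            continue  # ignore plain-text noise
        try:
            event = json.loads(line[len("@nix "):])
        except json.JSONDecodeError:
            continue
        action = event.get("action")
        if action == "start" and event.get("type") == ACT_COPY_PATH:
            copy_ids.add(event.get("id"))
            started += 1
        elif action == "stop" and event.get("id") in copy_ids:
            # Stop events have no type, so match them to a remembered start id.
            copy_ids.discard(event["id"])
            finished += 1
    return started, finished
```

Nix writes these log lines to stderr, so a caller would feed the stderr of the `nix --log-format internal-json build … --max-jobs 0` invocation into this function line by line.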
build.yml was defaulting VERSION=2.5.0 on push triggers (and the
workflow_dispatch default also read 2.5.0), so this branch's auto-build
was publishing v2.5.0-migration tarballs while the migration branch's
downloader (software.py _MIGRATION_VERSION_INFO and the brickbots/PiFinder
release branch's migration_gate.json) points at v3.0.0-migration. Bump
both the workflow_dispatch default and the push-trigger fallback to 3.0.0
so a normal push to nixos publishes the artifact at the URL the migration
branch actually downloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Lint & Test workflow was using DeterminateSystems/magic-nix-cache-action,
which is backed by GitHub's Actions Cache and gets HTTP-418 rate-limited
under sustained traffic — exactly the failure mode that just broke
type-check ("--install-types failed: substituter disabled, rate limit
exceeded"). The Nix substituter is then disabled mid-run and dependent
commands like mypy --install-types fall over.

Replace it with cachix/cachix-action@v17 pointed at the pifinder cache
(read-only, no auth token needed). Same backing as build.yml, so dev-shell
substitutes hit the same store paths the system closure was built against.
cache.nixos.org remains the default fallback.

Also bump actions/checkout@v4 → @v6 in this file to align with the Node 24
migration in build.yml/release.yml.

This is a stop-gap. The real fix is standing up Attic with an S3 backend
so both build.yml and lint.yml can retire cachix.org and MNC together —
tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rong)

Previous commit on this branch swapped DeterminateSystems/magic-nix-cache-action
for cachix/cachix-action@v17 thinking the MNC HTTP-418 rate-limit was the
root cause of the failed lint/type-check. That swap made things worse:
the pifinder Cachix only contains the NixOS *system closure*, not the
*dev shell* (cedar-detect-server's Rust crate builds). With MNC removed,
the dev shell had to rebuild from source, which fetched crate tarballs
from a crates.io mirror and hit 403s.

MNC was carrying real weight by caching locally-built derivations
between runs. Restoring it. The original MNC rate-limit was a transient
flake — re-runs work around it. Real fix is standing up Attic with
S3-backed storage so both build.yml and lint.yml can retire MNC and
cachix.org together.

The checkout@v4 → @v6 bump from the swap commit is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Nix derivation was overwriting pifinder-build.json with
"nix-${gitRev}" at build time, so even released devices reported a
random short-sha instead of the release version. Three writers became
two, with consistent semantics everywhere:

- pifinder-src.nix: drop the cat > pifinder-build.json block and the
  gitRev arg — the derivation now copies the source file through
  verbatim, no version invention.
- flake.nix: drop the pifinderGitRev _module.args plumbing.
- services.nix: drop pifinderGitRev / gitRev from the pifinder-src
  import.
- release.yml: reorder so the version stamp is written into the
  working tree BEFORE the nix build (so the store path bakes in the
  release version, not the previous stamp), then re-stamp with the
  resulting store_path after the build, commit, push, tag.

Result: SD image, cachix closure, and committed JSON all agree on
the released version. Matches the flow already documented in
nixos/RELEASE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the decision to self-host Attic at cache.pifinder.eu, backed by
SQLite + local disk initially with Cloudflare R2 as the eventual chunk
store. Covers considered alternatives (cachix.org, Magic Nix Cache,
nix-casync, harmonia) and the operational consequences for CI publishing,
on-device updates, and failure fall-through to cache.nixos.org.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces pifinder.cachix.org with the self-hosted Attic instance at
https://cache.pifinder.eu/pifinder (ADR 0004) on both the device-side
substituter list and the CI publish path. cachix.org is removed
entirely with no fallback, so this is the first build that proves
Attic stands on its own.

services.nix:
  substituters       = ["https://cache.pifinder.eu/pifinder"
                        "https://cache.nixos.org"]
  trusted-public-keys = ["pifinder:8UU/O3oLkaJHHUyqEcPGl+9F1m4MqDca39Ewl49jBmE="
                         "cache.nixos.org-1:..."]
  (pifinder.cachix.org and its key removed.)

build.yml — build-native, build-emulated, build-migration-tarball:
  - remove cachix/cachix-action steps
  - remove `cachix push` (replaced by `attic push pifinder:pifinder`)
  - add a "Setup Attic substituter" step that runs
      nix profile install nixpkgs#attic-client
      attic login pifinder https://cache.pifinder.eu \"\$ATTIC_TOKEN\"
      attic use pifinder:pifinder
    before the build, so the build itself substitutes from
    cache.pifinder.eu.
  build-emulated swaps cachix/install-nix-action for
  DeterminateSystems/nix-installer-action — no cachix dependencies left.

First post-cachix build is expected to be slow: pifinder.cachix.org's
warmed-up paths are gone. Once it completes and `attic push` lands,
subsequent builds substitute from cache.pifinder.eu.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cceed)

Brickbots/PiFinder runs the workflow on pull_request:synchronize from
PR brickbots#379 (mrosseel:nixos -> brickbots:main). With ATTIC_TOKEN now also
set on brickbots, build-emulated's 'Push to Attic' step succeeds — the
last failing step. But stamp-build then tries to checkout the PR head
ref (mrosseel:nixos) and 'git push' a pifinder-build.json commit there,
which can't work from brickbots' Actions runner (no write access to the
fork). The PR run therefore failed at the stamp step even after attic
was wired correctly.

Gating stamp-build on github.event_name == 'push' keeps the canonical
stamp on the mrosseel:nixos push run (where it works) and skips it on
brickbots PR runs (which only need to verify the build).

Net effect: both repos' CI runs in parallel without stomping —
- Both build and push the same closure to cache.pifinder.eu (attic
  FastCDC-dedups, so the second push is a no-op),
- Only mrosseel stamps pifinder-build.json,
- build-migration-tarball already gates on github.ref == refs/heads/nixos
  so it only runs on mrosseel push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sync the fork with upstream up to 0e0ec03 (39 commits since the last
sync at e4b623a): typed positioning model (PointingEstimate / ImuSample
/ AlignedResult), polar alignment, comet vectorization, single-instance
lock, resolution-flexible UI, i18n fonts.

Conflict resolution (18 files):
- Adopt upstream's typed positioning model across solver, integrator,
  state, imu_pi, status, main, base, console, object_details, fonts,
  menu_structure, auto_exposure, camera_interface.
- Keep NixOS layers: software.py (store-path upgrade UI), utils.py
  (build_json, writable comet_file, robust pifinder_dir), sys_utils.py.
- plot.py: restore top-level `import pandas` (upstream added module-level
  uses; the fork had made it lazy) and drop the redundant lazy imports.
- solver.py taken verbatim from upstream; the cedar-detect dev-spawn is
  proposed upstream separately (brickbots#478).
- Drop fork-deleted nox/pip tooling (requirements.txt, version.txt).

deps: add xlrd to python-packages.nix (pyerfa already present).

Verification: ruff clean; 479 unit tests pass. Remaining failures are not
from this merge -- 6 test_software + 4 test_t9_search pre-exist on
origin/nixos; 5 test_comets fail on skyfield 1.53 (upstream vectorization,
has a runtime fallback).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Skyfield's propagate() lays a batched Kepler orbit out as
(3, #orbits, #times) but sets output_shape = (3,) + t1.shape, so a
batched orbit propagated to a single scalar time raises "cannot reshape
array of size 3N into shape (3,)" on skyfield >= 1.46. (The fork uses
nixpkgs' skyfield 1.53; upstream pins 1.45, which tolerated it.)

Give every comet the same target time as an (N, 1) column so output_shape
matches the (3, N, 1) result, then squeeze the time axis. Verified against
the per-comet path (0 AU difference) and the existing tests/test_comets.py
oracle (7/7 pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
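The shape mismatch can be reproduced with NumPy alone. mimic_batched_propagate is a hypothetical stand-in that copies only the layout described above, (3, #orbits, #times) reshaped to (3,) + t.shape; it is not skyfield's code.

```python
import numpy as np


def mimic_batched_propagate(positions, t):
    """Hypothetical stand-in for a batched propagate(): lay out (3, N, T), reshape to (3,) + t.shape."""
    t = np.asarray(t, dtype=float)
    n_times = 1 if t.ndim == 0 else t.shape[-1]
    result = np.repeat(positions[:, :, np.newaxis], n_times, axis=2)  # (3, N, T)
    return result.reshape((3,) + t.shape)


def positions_at_single_time(positions, tt):
    """Give every orbit the same time as an (N, 1) column, then drop the time axis."""
    n_orbits = positions.shape[1]
    t_column = np.full((n_orbits, 1), tt)  # target shape becomes (3, N, 1)
    return mimic_batched_propagate(positions, t_column).squeeze(axis=-1)  # (3, N)
```

With a scalar time the reshape to (3,) fails for any batch of more than one orbit, while the (N, 1) column makes the target shape (3, N, 1) match the result before squeeze(axis=-1) drops the time axis.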
…che)

The upstream merge auto-merged catalogs.py entirely to the fork's side,
silently dropping upstream's T9 search (brickbots#464: KEYPAD_DIGIT_TO_CHARS,
search_by_t9) and the catalog disk cache; test_t9_search (an upstream
test) failed as a result. Take upstream's catalogs.py wholesale, matching
the upstream main.py and menu_structure the merge already adopted.
Trade-off: drops the fork's priority-fast-path / background-loader in
favour of upstream's cache + loader (the agreed "take upstream catalogs" choice).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MIN_NIXOS_VERSION was intentionally bumped 2.5.0 -> 3.0.0 (d705057, prep
for 3.0), but test_software still asserted 2.5.x/2.6.x as qualifying.
Shift the qualifying mock releases and _meets_min_version cases into the
3.x line so the suite matches the current minimum; below-min (2.4.0) and
draft cases are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
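A sketch of the kind of comparison _meets_min_version performs, assuming release tags look like "3.0.0" or "v3.0.0" with an optional suffix. meets_min_version is hypothetical and is not the helper the tests exercise; draft handling is not modelled.

```python
import re
from typing import Tuple


def _version_tuple(tag: str) -> Tuple[int, ...]:
    """Parse 'v3.1.0' or '3.1.0-beta' into (3, 1, 0); any suffix is ignored."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a semantic version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def meets_min_version(tag: str, minimum: str = "3.0.0") -> bool:
    # Integer tuples compare component by component, unlike raw strings.
    return _version_tuple(tag) >= _version_tuple(minimum)
```

Comparing integer tuples instead of strings keeps "10.0.0" above "3.0.0", which a lexical string comparison would get wrong.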
main.py created (and server.py switched) the pifinder_logconf.json symlink
relative to the cwd, which on NixOS is the read-only /nix/store -> the
service crash-looped at startup (OSError: Read-only file system).

Follow the same split config.json already uses: the logconf presets stay
read-only in the source tree (utils.pifinder_dir/python/logconf_*.json) and
the active selection is persisted as a bare filename in the writable data
dir (PiFinder_data/log_config), resolved via utils.active_logconf_path().
Storing the name (not a store-path symlink) keeps the choice valid across
upgrades (which GC old store paths) and reboots.

No NixOS workaround needed; also removes the write-to-source-tree
antipattern on Raspberry Pi OS (upstreamable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
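A minimal sketch of the bare-filename scheme, assuming the presets live in one read-only directory and the selection is stored in PiFinder_data/log_config. The explicit preset_dir/data_dir arguments, the default preset name, and set_active_logconf are hypothetical; the real utils.active_logconf_path() may differ.

```python
from pathlib import Path

DEFAULT_LOGCONF = "logconf_default.json"  # hypothetical default preset name


def active_logconf_path(preset_dir: Path, data_dir: Path, default: str = DEFAULT_LOGCONF) -> Path:
    """Resolve the active log config from a bare filename stored in data_dir/log_config."""
    selection_file = data_dir / "log_config"
    name = default
    if selection_file.is_file():
        stored = selection_file.read_text().strip()
        # Accept only a bare filename, never a (possibly GC'd) store path.
        if stored and Path(stored).name == stored:
            name = stored
    candidate = preset_dir / name
    return candidate if candidate.is_file() else preset_dir / default


def set_active_logconf(data_dir: Path, name: str) -> None:
    """Persist the selection as a bare filename in the writable data dir."""
    if Path(name).name != name:
        raise ValueError("expected a bare filename")
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / "log_config").write_text(name + "\n")
```

Because only a filename is stored, the selection keeps resolving against whichever preset directory the current build provides, and anything that is not a bare filename, or names a missing preset, falls back to the default.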
The upstream merge kept nixos's outer tetra3_dir (python/PiFinder/tetra3)
while taking upstream's solver.py, which built the DB path as
tetra3_dir/data/default_database.npz. But the submodule nests the package
at tetra3/tetra3, so the DB is actually at tetra3/tetra3/data -> the solver
crashed at startup with FileNotFoundError.

Load it by its canonical name, Tetra3("default_database"); tetra3 resolves
the bundled DB from its own package data dir regardless of the inner/outer
layout. Validated on-device (loads from python/tetra3/data).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
release.yml now installs Nix + Attic like build.yml and pushes the release closure to a dedicated, never-GC'd `pifinder-release` cache. Devices and the migration first-boot trust it ahead of the dev `pifinder` cache and cache.nixos.org. Removes the last Cachix usage from active config.

Docs (RELEASE.md, ADR 0004, NIXOS_STATUS.md) document the dev-vs-retained-release two-cache split and the per-cache retention caveat. The pifinder-release trusted-public-key is a placeholder until the cache is bootstrapped server-side.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 6 .direnv/ nix-direnv cache files were tracked but regenerate on
every direnv reload, so rebases baked divergent copies into the nixos
stack. Gitignore + untrack stops that churn going forward.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…erivations

Replace the 524-line nixos/pkgs/python-packages.nix (26 manually packaged PyPI deps, each with hand-chased hashes and build patches) with a uv-managed workspace realized into the Nix store via uv2nix.

Changes:
* Deps declared in python/pyproject.toml, pinned in python/uv.lock (117 pkgs)
* nixos/pkgs/uv-python.nix builds the runtime/dev virtualenvs; the 5 native packages (python-libinput, python-prctl, python-pam, dbus-python, pygobject) keep their build patches as uv2nix overrides
* flake.nix: add pyproject-nix/uv2nix/pyproject-build-systems inputs, thread via specialArgs, devShell uses the uv2nix devEnv
* libcamera Python bindings stay a Nix overlay (not on PyPI)

All four nixosConfigurations + the devShell evaluate; the aarch64 build is to be validated by CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cache

DeterminateSystems/magic-nix-cache-action now returns HTTP 418 and the
GitHub Actions cache rate-limits it (Twirp ResourceExhausted, "rate limit
exceeded"), so `nix develop` cannot fetch the dev environment and every
lint/test/type-check job fails before ruff/pytest/mypy even run — which all
testable PRs inherit.

Mirror build.yml/release.yml and substitute from the self-hosted Attic cache
cache.pifinder.eu (ADR 0004) instead, falling back to cache.nixos.org when
ATTIC_TOKEN is unavailable (e.g. fork PRs) so the job never hard-fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
state.py imported timezonefinder and built a TimezoneFinder() in SharedState.__init__, and plot.py imported pandas at module level. Both load during boot — SharedState is constructed at startup, and plot.py is pulled in via menu_structure -> UIChart. Defer the TimezoneFinder construction to the first set_location(), and pandas to the plot functions that use it, so neither blocks startup.

comets.py is intentionally left unchanged: its module-level 'from skyfield.data import mpc' imports pandas regardless, so deferring pandas there has no effect without also deferring skyfield.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mrosseel added the testable label (Ready for testing via PiFinder software update) Jun 19, 2026